Research on Web Information Extraction Based on Combining Block Importance Model and Xpath

Computer and Modernization, 2009, Vol. 8, Issue (8): 73-75,7. doi: 10.3969/j.issn.1006-2475.2009.08.020

• Information Systems •

PANG Qiu-ben, GU Ping, YANG Xiao-mei

School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China

Received: 2008-08-29 Revised: 1900-01-01 Online: 2009-08-21 Published: 2009-08-21

Abstract

Page segmentation approaches reduce the unit of Web information extraction from the whole page to the block. This paper surveys the main page segmentation approaches and the learning-based block importance model, and analyzes XPath-based Web information extraction. Combining the advantages of the two, it proposes a new Web information extraction approach based on the block importance model and XPath, discusses its design process, and gives a formalized description and experimental results. The results show that the approach is well suited to extracting information from Web pages that contain many records.

Key words: page segmentation, block importance value, XPath, Web information extraction

CLC Number: TP391.1

PANG Qiu-ben, GU Ping, YANG Xiao-mei. Research on Web Information Extraction Based on Combining Block Importance Model and Xpath[J]. Computer and Modernization, 2009, 8(8): 73-75,7.
